perm filename IIA1.PUB[NSF,MUS]1 blob
sn#096539 filedate 1974-04-10 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00014 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 .SELECT B
C00004 00003 .SELECT B
C00013 00004 .BEGIN FILL ADJUST
C00019 00005 .SELECT A
C00028 00006 .BEGIN FILL ADJUST
C00033 00007 .GROUP SKIP 2
C00038 00008 .SELECT 5
C00042 00009 .SELECT 5
C00048 00010 .GROUP SKIP 2
C00054 00011 .NEXT PAGE
C00058 00012 .SELECT 5
C00065 00013 .SELECT 5
C00067 00014 .SELECT 5
C00071 ENDMK
C⊗;
.SELECT B
.BEGIN CENTER
II. RESEARCH PROPOSAL
.END
.GROUP SKIP 2
.SELECT 5
note to the reader
.SELECT 1
.BEGIN FILL ADJUST
In the following section we present the nature of our current and
proposed research. Because of the scope of our research, this
presentation is necessarily lengthy. To aid the reader in obtaining
an overview of this material, we have adopted the following
convention in this section. Introductory and summary materials,
including references to pertinent Figures and Recorded Examples, are
presented at the head of all divisions of this section. More %5detailed
presentations%1 follow immediately as subsections which have
%5headings in lower-case italicized type (as here printed)%1. The
reader who first desires to get an overview of our work, before
becoming involved with the %5details%1, can use this format as a guide.
.END
.GROUP SKIP 2
.SELECT B
.BEGIN CENTER
A. SIMULATION OF MUSIC INSTRUMENT TONES
.END
.SELECT 1
.BEGIN FILL ADJUST
In this part of the proposal we will discuss our approaches to the
computer simulation of music instrument tones. The main goal of our
research is the development of a powerful, general purpose technique
for the simulation of auditory signals that will have the perceptual
complexity and naturalness of the musical sounds which occur in the
real world. The fundamental concern here is with the synthesis of
natural timbres of the extremely varied and highly complex tones
which occur in music. The definition of timbre most accepted in the
literature of auditory theory is that stated by the American
Standards Association (1960): "Timbre is that attribute of auditory
sensation in terms of which a listener can judge that two sounds
similarly presented and having the same loudness and pitch are
dissimilar." It is added that: "Timbre depends primarily upon the
spectrum of the stimulus, but it also depends upon the waveform, the
sound pressure, the frequency location of the spectrum, and the
temporal characteristics of the stimulus."
The immediate problem we face in designing a successful algorithm for
the computer simulation of music instrument tones involves the exact
nature of the psychophysical relationships in timbre, that is, the
relationships between the physical properties of sounds and the
subjective, psychological qualities by which they are perceptually
differentiated when presented at the same loudness and pitch. We
have found that there has been relatively little work done in the
last century of auditory research investigating the psychophysical
relationships in timbre perception. Even in the work that has been
done, most researchers have held such restrictive definitions of
timbre that their findings are entirely useless for application in
our present attempt to design simulation algorithms. This should be
evident if we examine the real lack of clarity shown in the
definition given in the last paragraph, where timbre is most
precisely pinpointed by a negative formulation: whatever is not pitch
and not loudness (and, we might add, not duration and not location)
is `timbre'. Of the extremely complex acoustical left-overs,
researchers have mainly focussed on the (so-called) `steady-state'
spectrum of stimuli as the dominant, if not exclusive, factor in
timbre.
It is becoming clear that this is only one of many factors in timbre
- if indeed there even is an actual `steady-state' in real tones,
that is, a duration of stability in which the amplitudes of the
harmonic components of a tone remain constant. From the results of
recent analyses of real tones, as well as attempts to synthesize
natural-sounding tones, it appears that real tones have complex and
ever-changing physical properties, and that the nature of these
temporal changes is most probably an extremely important factor in
timbre (Luce, 1963; Risset, 1966; Strong & Clark, 1967a, 1967b;
Freedman, 1967, 1968). The vagueness of the above-cited definition
seems to accurately represent the actual state of knowledge to date
concerning timbre perception, and to ask for a more precise and
useful statement demands fundamental research which is yet to be
done. We are now in the position to accomplish such research, given
the possibilities presented by the digital computer.
Supporting our goal to develop a technique for the computer
simulation of natural tones, therefore, is a concurrent research
effort to formulate a model which is able to describe the
perception of musical sounds. The theoretical problem is to establish
a set of acoustical dimensions, those which are actually salient in
the perception of musical timbre, and to design a computational
algorithm that enables the user to exert the most direct control with
respect to these dimensions for the purposes of simulation. The
empirical problem which follows is the determination of those aspects
of the signal which are actually important in the perception and
identification of a sound. Necessarily included is a study of the
distinctive features of signals, the investigation of physical
conditions which contribute to the naturalness of a signal. Related
research should examine the general characteristics of timbre
perception, looking into the effects of such phenomena as the
categorical identification of musical sounds.
.NEXT PAGE
The discussion which follows is concerned with a description of our
systematic approach to a general model for simulation, which is based
on perceptual verification at every step. The two strategies which
we have used for simulation are additive synthesis which is based on
the analysis of real tones, and frequency-modulation synthesis. The
first method discussed below, additive synthesis based on analysis,
has data reduction as its research goal. We start with the
most complete, complex information about a real signal given through
its analysis. We then systematically step in the direction of the
most simple representation of the signal which can be used
successfully to reproduce the original tone by additive synthesis.
To this end we examine the perceptually important aspects of physical
signals.
.END
.BEGIN FILL ADJUST
The second approach towards a model for simulation discussed below
begins with a simpler, more easily-controlled process, frequency
modulation synthesis of sound. This technique allows the user to
directly manipulate aspects of the signal that we subsequently found
to be very meaningful in terms of certain perceptual cues for music
instrument tones. The success of this method first came as a
surprise because the physical waveform that it generates is
strikingly different from that of any natural signal. However, upon
inspection, the reasons for this success have been determined, and
we thereby began to learn what physical dimensions are perceptually
important for timbre. The direction of this approach is to increase
the complexity of the synthesis process, until there is control over
a very wide set of features which occur in natural tones.
Following the detailed description of these two methods is a third
section devoted to a discussion of the ultimate aim of our research:
the development of powerful, general-purpose algorithms for the
computer simulation of natural tones. This more general algorithm
will be an outgrowth of the interdependency and convergence of our
two approaches to synthesis. The two approaches do not proceed
independently of one another, but interact at several levels.
Findings in one technique can immediately be applied to the other,
and a system for cross-verification is thereby established. In this
way, a convergence of these two methods is approached. The
simplicity and perceptual meaningfulness of specifications to the
frequency modulation technique point out an important goal for the
additive synthesis method. On the other hand, the complexities of
tone which are revealed by analysis, and which are confirmed to be
perceptually salient in the additive synthesis, point out necessary
levels of complexity which must be accommodated by the frequency
modulation technique. As the latter technique is then made more
complex, it in fact enters the category of additive synthesis. The
ultimate model for simulation will draw from the research findings
using both methods.
.NEXT PAGE
A common aspect of research with both methods is the concern for
perceptual verification of any particular results at hand.
Experimental methods from perceptual psychology are employed for the
rigorous verification of the success of simulation, in terms of the
discriminability of synthesized from real tones and in terms of the
naturalness of simulation. In addition, to assist in the development
of a general algorithm, we will have to formulate a general model for
the perception of timbre. This general model will provide
information for the construction of perceptually-based higher-order
simulation algorithms. We employ a spatial model for the subjective
structure of the perceptual relationships between signals. Research
is directed at uncovering the dimensionality of the subjective space,
the psychophysical relationships which are structurally correlated to
this space, and the properties of the space. The existence of such
constraints as categorical boundaries will be investigated in an
attempt to assess the continuity of the subjective space for timbre.
In the same regard, we will also examine the effects of musical
training or context on the structure of the space. The model will be
evaluated by our ability to predict the mappings of real and novel
tones.
.END
.GROUP SKIP 2
.SELECT A
1. ADDITIVE SYNTHESIS BASED ON THE ANALYSIS OF REAL TONES
.GROUP SKIP 1
.SELECT C
INTRODUCTION TO SYNTHESIS AND ANALYSIS TECHNIQUES
.SELECT 1
.BEGIN FILL ADJUST
This section will introduce our approach to the simulation of music
instrument tones using additive synthesis based on the analysis of
tones from actual instruments. Additive synthesis considers a complex
sound to be the sum of a set of sinusoidal components, or harmonics.
A basic presentation of synthesis and analysis techniques will
follow. These are based on computer processes that analyze a real
tone, which has been recorded and digitized, into time-varying
frequency and amplitude functions for each of its harmonics. A
concrete example of this is given in Figure 1 for the first four
harmonics of a violin tone. Given the results of analysis, we can
then reproduce the tone by additive synthesis, where the set of
sinusoidal components are controlled in amplitude and frequency by
the analyzed functions, and their outputs are added together to
constitute the complex music instrument tone. Various other methods
for displaying sets of amplitude and frequency functions are given in
Figures 2 through 4.
.NEXT PAGE
%5synthesis%1
In additive synthesis we physically model a complex sound waveform as
a sum of sinusoids with slowly time-varying amplitudes and phases.
The process of synthesis involves specifying the amplitude and phase
(equivalently, amplitude and frequency) for each component sinusoid
as it varies with time throughout the duration of the tone. We will
generally refer to this specification as being a time-varying
function, amplitude or frequency, for a component sinusoid.
These sinusoids are added together to produce the
complex waveform. Equation (1) summarizes this formulation.
.END
.GROUP
.SELECT 3
         M
(1) F%8α%3 = %6S%3 A%8n%3 sin(%4w%8n%3αh+%4q%8n%3)
        n=1
Notation: α is the sample number
          h is the time between consecutive samples
          F%8α%3 is the sampled, digitized waveform at time αh
          A%8n%3 is the amplitude of the nth partial tone
            and is assumed to be slowly varying with time
          %4q%8n%3 is the phase of the nth partial tone
            and is assumed to be slowly varying with time
          %4w%8n%3 is the radian frequency of the nth partial tone
.SELECT 1
.APART
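.BEGIN FILL ADJUST
As an illustration of equation (1), the following is a minimal Python
sketch of the summation, not the synthesis program used in this work.
The function name, the representation of the amplitude and phase
functions as callables of the sample number, the constant amplitude
values, and the sampling rate of 25,000 samples per second are all
assumptions made for the example.
.END
```python
import math

def additive_synthesis(amps, phases, omegas, h, num_samples):
    """Evaluate equation (1) sample by sample.

    amps[k] and phases[k] are callables of the sample number giving the
    slowly varying amplitude and phase of one partial; omegas[k] is the
    radian frequency of that partial; h is the time between samples.
    """
    wave = []
    for a in range(num_samples):
        t = a * h
        wave.append(sum(A(a) * math.sin(w * t + q(a))
                        for A, q, w in zip(amps, phases, omegas)))
    return wave

# Hypothetical example: four harmonics of a 308 Hz tone lasting 400 ms.
SAMPLE_RATE = 25000.0                       # assumed sampling rate
h = 1.0 / SAMPLE_RATE
fundamental = 2.0 * math.pi * 308.0         # radian frequency of harmonic 1
omegas = [n * fundamental for n in (1, 2, 3, 4)]
amps = [lambda a, n=n: 1.0 / n for n in (1, 2, 3, 4)]   # made-up constants
phases = [lambda a: 0.0 for _ in range(4)]
tone = additive_synthesis(amps, phases, omegas, h, num_samples=10000)
```
.BEGIN FILL ADJUST
In an actual simulation the constant amplitudes and zero phases above
would be replaced by the functions obtained from analysis.
.END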
.BEGIN FILL ADJUST
One can see that from this model, if we can determine the functions
A%8n%1 and %4q%8n%1 of a tone from a musical instrument, we can then
synthesize an approximation to the waveform F%8α%1 from those
functions by use of equation (1). The degree to which this form of
synthesis has been successful will be discussed below. To determine
the functions A%8n%1 and %4q%8n%1 of a music instrument tone, we must
assume the frequencies of the partial tones, %4w%8n%1, are nearly
harmonically related. By harmonically related, we mean that the tone
has a fundamental frequency, %4w%1, and that the frequencies of all
the partials of the tone are integer multiples of the fundamental
frequency. That is, the frequency of the n%2th%1 partial, %4w%8n%1,
is approximately n%4w%1.
It should be pointed out that equation (1) could have been formulated
with time-varying frequencies and constant phases. This formulation
is equivalent and, for all practical cases, either can be derived
from the other. We will speak interchangeably of the `phases' of the
harmonics and the `frequencies' of the harmonics as a function of
time. In the context of the analysis of tones, it is most natural to
produce the phases of the harmonics as functions of time, as is shown
in Appendix A. For intuitive purposes, however, it is more
instructive to view displays of the frequencies of the harmonics as
functions of time, and we will therefore usually refer to the
frequency (rather than phase) functions of harmonics. In Figure 1 we
present an example of a set of time-varying amplitude and frequency
functions for four component sinusoids of a tone. We would use these
functions in additive synthesis to control the amplitudes and
frequencies of four components of a complex tone. In fact, they
would constitute the first four harmonics of the tone, their average
frequencies being approximately 308 Hz, 616 Hz, 924 Hz, and 1232 Hz,
respectively. We should note that these functions were actually the
result of the computer-analysis of a real tone, which was tape
recorded and then digitized.
.GROUP SKIP 2
%5analysis for additive synthesis and graphic techniques%1
The method we have found most useful for analysis we call the
`heterodyne filter.' This is described in detail in Moorer (1973) and
is derived briefly in Appendix A. We take the digitized waveform of a
single music instrument tone and for each harmonic under analysis, we
perform the following operations: We form the products of the
digitized waveform with a sine and cosine at the frequency of that
harmonic and compute the average of each product over one period of
the fundamental frequency of the tone. The square root of the sum of
the squares of these two averages is an approximation to the
amplitude of the harmonic in question at that point in time. The
inverse tangent of the ratio of these two averages is an
approximation to the phase of the harmonic in question at that point
in time. We repeat this process throughout the duration of the note
for all harmonics.
.END
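.BEGIN FILL ADJUST
The heterodyne operations described above can be sketched in Python
as follows. This is an illustrative reading of the prose, not the
program of Moorer (1973) or the derivation of Appendix A. The function
name, its arguments, the factor of 2 used to scale the result to the
amplitude of the sinusoid, and the two-partial test signal are
assumptions. The example chooses a 250 Hz fundamental so that one
period is a whole number of samples at an assumed 25,000 samples per
second.
.END
```python
import math

def heterodyne(wave, h, f0, harmonic, start):
    """Estimate amplitude and phase of one harmonic near sample `start`.

    The waveform is multiplied by a sine and a cosine at the harmonic
    frequency and each product is averaged over one period of the
    fundamental f0. h is the time between consecutive samples.
    """
    w = 2.0 * math.pi * f0 * harmonic
    period = int(round(1.0 / (f0 * h)))     # samples per fundamental period
    s = c = 0.0
    for a in range(start, start + period):
        s += wave[a] * math.sin(w * a * h)
        c += wave[a] * math.cos(w * a * h)
    s /= period
    c /= period
    # For a partial A*sin(w*t + q): s is (A/2)*cos(q), c is (A/2)*sin(q).
    amplitude = 2.0 * math.sqrt(s * s + c * c)
    phase = math.atan2(c, s)                # inverse tangent of the ratio
    return amplitude, phase

# Hypothetical test signal with two known partials.
h = 1.0 / 25000.0
wave = [0.8 * math.sin(2.0 * math.pi * 250.0 * a * h + 0.3)
        + 0.5 * math.sin(2.0 * math.pi * 500.0 * a * h + 1.0)
        for a in range(300)]
amp1, phase1 = heterodyne(wave, h, 250.0, 1, 0)
amp2, phase2 = heterodyne(wave, h, 250.0, 2, 100)
```
.BEGIN FILL ADJUST
Repeating the call at successive values of start yields the amplitude
and phase functions for that harmonic. A frequency function could then
be obtained from the change in phase between successive analysis
points.
.END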
.BEGIN FILL ADJUST
As aids to the researcher, we have designed several different methods
for displaying the results of analysis. The output of the heterodyne
filter can, of course, be displayed as a number of isolated amplitude
and frequency functions, covering the individual components, as is
shown in Figure 1 for the first four harmonics of a violin tone. The
total duration of the tone is about 400 milliseconds and its
fundamental frequency is about 308 Hz. Sixteen harmonics were
actually analyzed for this tone, but we present the isolated plots
for only the first four of these in Figure 1. Three more pages of
such plots would cover the remaining harmonics; however, the first
page is sufficient to get a feeling for the sort of information which
can be obtained from this form of display.
.NEXT PAGE
To obtain a more easily-grasped picture of the relationships between
all harmonics of a tone, it has been found most informative to view
the entire set of harmonics together. One method designed for this
is the three-dimensional perspective plot. Figure 2 shows such a plot
of the amplitudes of all sixteen partials of the same violin tone.
The fundamental appears as the backmost function in the picture,
while the highest harmonic is represented as the frontmost function.
This form of display allows us to more readily discover relationships
among the harmonics. The perspective plot can be spatially rotated
on-line by the computer, so that the observer is able to see the
three-dimensional representation from any angle. This has been very
helpful in getting a more comprehensive understanding of the behavior
of the partials of a tone as a function of time.
Another form of display revealing the evolution of the partials of a
tone as a function of time is the sequential line-spectrum plot.
Here, we make use of animation techniques to present successive
moments in the tone, presenting a plot of the amplitudes of all the
harmonics at each moment in time. One plot is shown in Figure 3,
taken from the middle of the violin tone. This strictly on-line
display presents such two-dimensional frequency by amplitude plots of
the partials for successive instants in time, and the viewer can
follow the amplitude changes for the partials from the beginning to
the end of the tone.
A fourth way of examining the output of the heterodyne filter,
inspired by the conventional speech spectrograph, is given in Figure
4. The particular advantage of this form of display is that it
presents both frequency and amplitude information at once in a
concise plot, allowing us to view relationships between the two as
functions of time. Here, the thickness of each bar is proportional to
the log of the amplitude of that harmonic. The vertical position
represents its instantaneous frequency, as determined from the phase
drift of the harmonic. The utility of this display is its
representation of the phase information with respect to amplitudes.
.END
.GROUP SKIP 2
.SELECT C
CURRENT RESEARCH
.SELECT 1
.BEGIN FILL ADJUST
We begin the discussion of our current research with the results of
the perceptual evaluation of the analysis-synthesis strategy, a
necessary test of the usefulness of this approach in simulating
natural tones. Experienced listeners verified that the strategy of
additive synthesis based on the results of heterodyne analysis is
indeed capable of producing tones that are perceptually
indistinguishable from their respective original recorded and
digitized tones. The main function of the analysis-synthesis
technique is to provide a starting point for simulating signals which
retain the perceptually salient features of complex natural tones.
We begin with the most complete set of data on the signal which can
be provided by analysis. We are assured that additive synthesis based
on this highly detailed information will produce a signal which is
indistinguishable from the original (compare Examples 1 & 2, the digitized
and synthesized versions of the violin tone given in Figures 1 through 4).
Our goal is to determine which aspects of the simulated signal, and
therefore the original signal, are perceptually important. Some
attempts have been made in the past to reduce the highly complex
information derived from the analysis of real tones to just those
perceptually important aspects of the signal which are necessary for
its simulation (Risset, 1966; Strong & Clark, 1967a, 1967b). Our
current research involves an extension of this type of work. We will
first discuss the systematic modifications of tones which we have
performed in our work, namely the filtering of signals in order to
localize their most perceptually important components. Synthesizing
tones from selected components rather than the full
set of analyzed harmonics, a highly accurate form of `filtering', was
found to provide a means to study the important cues for the
identification of instruments.
We will then describe a main direction in our current research: the
simplification of the very complex data structure which is obtained
from the analysis of the physical properties of a real tone. This
process, generally referred to as data reduction, serves many of the
goals which we have set for an ideal simulation algorithm. We will
describe the remarkable success which we have had in using small
numbers of line segments to represent the time-varying amplitude and
frequency functions for the components of various tones. We will
mainly cite the results of studies on the violin as one concrete
example, and briefly mention related findings for other types of
instruments. A strikingly successful reduction of the complex data
obtained from the analysis of a violin tone (shown in Figures 1 & 2
and presented as Recorded Example 1) was a representation of each amplitude
and frequency function by only three line segments (shown in Figures
5 & 6 and Recorded Example 1).
.END
.GROUP SKIP 2
.SELECT 5
perceptual evaluation of analysis-synthesis strategy
.SELECT 1
.BEGIN FILL ADJUST
It is necessary to confirm the utility of our analysis-synthesis
strategy for the simulation of musical tones on the basis of the
perceptual success of the synthesized signal. That is to
say, we must establish that the signal which is produced by our
analysis-synthesis technique is indiscriminable from the digitized recording of
the original sound. The critical test, then, is a comparison of
the sound which has been produced by additive synthesis with the
original musical tone that was analyzed with the heterodyne filter.
Informal experimentation, in which experienced listeners compared the
original recorded tones, played directly back after digitization,
with the tones that were synthesized on the basis of analysis, has
shown that the analysis-synthesis method produces an extremely
convincing replication of the original signal. The strategy thereby
has been perceptually verified as being capable of reproducing natural
tones. This is in agreement with the findings of other investigators
who have attempted to verify similar analysis techniques by comparing
tones synthesized from analysis with the original signals (Risset,
1966; Freedman, 1967, 1968).
Among the types of natural musical signals which have been
successfully simulated by the analysis-synthesis procedure are tones
from the string, woodwind, and brass families of the orchestra.
Specifically, we have been able to reproduce tones of various pitches
and durations from the following instruments:
.SELECT C
.NARROW 10,10
violin, viola, cello, double bass, trumpet, trombone,
French horn, baritone horn, oboe, English horn, bassoon,
Bb clarinet, alto clarinet, bass clarinet,
flute, alto flute, alto sax, soprano sax.
.WIDEN
.SELECT 1
The comparisons between the original digitized tones and their
respective simulations by experienced listeners, musicians and
acousticians, have demonstrated the potential power of our
analysis-synthesis technique. The digitized and synthesized violin
tones (the analysis of which is shown in Figures 1 through 4) are
presented in Recorded Example 1.
.END
.GROUP SKIP 2
.SELECT 5
filtering of signals to localize perceptual cues
.SELECT 1
.BEGIN FILL ADJUST
One sort of modification which we have employed to reveal perceptual
cues for the identification of tones is that of filtering. Directly
given by our analysis-synthesis method is the power to precisely
select the harmonics which will be synthesized. We have applied this
modification to a number of signals, to make a preliminary evaluation
of its usefulness for localizing, in the frequency plane, perceptual
cues for the identification of a number of music instrument tones. As
we had expected, there is a fairly broad variation in the minimal
number of lower harmonics which are necessary to transmit the
identity of an instrument. This variation occurred even though many
of the tones started with the same number of harmonics. A close
study of the variation of identification with the low-pass cut-off
frequencies for the individual tones gave a rough estimate of the
location of various perceptual cues for these instruments. We
concluded from these preliminary tests that the selective filtering
of signals could indeed provide much significant information about
the relationships between the physical properties of sounds and their
perceived qualities. We propose further work below, but give here an
example of the results of this testing on the violin.
The violin tone examined is the one displayed in Figures 1 through 4.
A number of filtering operations were performed on this signal, and
we had experienced listeners, including several musicians, attempt to
identify which instrument had produced the tones. Identification of
this particular source was increasingly difficult for most listeners
in the low-pass filtering condition as the cut-off was reduced below
the tenth harmonic. At that point the source of the signal was not
definitely identified as a violin, but as any of a set of
string instruments. This contrasted with the results obtained with
various other instrument sources - some of these had an equal number
of analyzable components to start with - which could be correctly
identified with much lower cut-offs. For example, the clarinet,
which also started with 16 harmonics, could be identified from only
harmonics 1, 3, and 5. With the violin, however, the prominent
activity in the seventh through eleventh harmonics, especially during the
attack, appears to be of great perceptual importance. In a
high-pass filtering condition, it was found that identification of the
violin source was difficult when only the first three harmonics were
absent. This was found with most of the other signals tested
(however, prominent cues for certain brass instruments were
associated with patterns of modulation during their attack segments,
and these sources could often be identified from a single component
which displayed the modulation pattern). More complex selective
filtering strategies again confirmed the importance of the activity
of the seventh through eleventh harmonics for the identification of
the violin tone.
.END
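.BEGIN FILL ADJUST
Because each harmonic is synthesized separately, the filtering
conditions described above amount to choosing which harmonic numbers
enter the sum. A minimal Python sketch of that selection follows. The
function name and its parameters are hypothetical, and the cut-offs
are expressed as harmonic numbers rather than frequencies.
.END
```python
def select_harmonics(num_harmonics, low_pass=None, high_pass=None, keep=None):
    """Return the sorted harmonic numbers that will be synthesized.

    low_pass keeps harmonics at or below that number, high_pass keeps
    harmonics at or above that number, and keep names an explicit set.
    """
    if keep is not None:
        chosen = set(keep)
    else:
        chosen = set(range(1, num_harmonics + 1))
    if low_pass is not None:
        chosen = {n for n in chosen if n <= low_pass}
    if high_pass is not None:
        chosen = {n for n in chosen if n >= high_pass}
    return sorted(n for n in chosen if 1 <= n <= num_harmonics)

# Conditions mentioned in the text, for a tone analyzed into 16 harmonics.
low_passed = select_harmonics(16, low_pass=10)      # cut-off at harmonic 10
high_passed = select_harmonics(16, high_pass=4)     # first three absent
clarinet_subset = select_harmonics(16, keep=[1, 3, 5])
violin_band = select_harmonics(16, keep=range(7, 12))
```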
.GROUP SKIP 2
.SELECT 5
data reduction
.SELECT 1
.BEGIN FILL ADJUST
A general strategy for data reduction is presently being pursued. The
complete data is initially represented by 400 to 500 line-segments
per amplitude and frequency function per harmonic. (This, of course,
is a reduction of data from the 25,000 points per second directly
produced by the analysis, but does not significantly distort the
complexity of microstructure within the functions.) We are in the
process of determining the minimum number of line-segments per
function which will allow for a successful simulation of the original
signal. Data reduction will be found to depend on an empirical
verification of the perceptual fitness of measures taken: the success
or failure of a current data reduction strategy contributes to the
understanding of the salient features in the perception of musical
tones; this understanding, in turn, directs the next stage of data
reduction.
Reduction efforts have met with a surprising degree of success.
Simulations based on as few as three line-segments per amplitude and
frequency function have been indistinguishable from the original
signals by experienced listeners. The shapes of the reduced
functions are now empirically derived from the originally complex
curves. An example of a successful three line-segment data reduction
of the amplitude and frequency functions for the violin tone shown
earlier in Figures 1 & 2 (Recorded Example 1) is shown in
Figures 5 & 6 (also Recorded Example 1). This represents an enormous
step in data reduction, where, for example, the violin tone
referred to could be represented by a total of fewer than 200 numbers
rather than over 16,000 (assuming approximately 500 segments per
function). The resources of the computer are optimally used with
respect to the storage of parameters for synthesis, the number of
input-output operations required, and the size and complexity needed
in the program used for synthesis.
The success of these reductions also presents a major discovery about
the perception of timbre, since the subtle micro-fluctuations which
occur in the physical parameters of these signals seem to have very
little perceptual importance, even in laboratory listening conditions
where tones are presented in temporal isolation for comparison. At
present, we are attempting to represent functions by two
line-segments, and have had encouraging results with the violin
tone, the only one tested so far. This instrument, we should note,
has been one of the most challenging sources for data reduction
attempts, and presents a good test of any particular technique.
Previously, an effort was made to employ constants, instead of
time-varying functions, for the frequencies of the components. This
was found to produce a noticeable change in the quality of the violin
tone, although several of the other signals tested suffered much less
discriminable alterations. The change was described by listeners as
a decrease in the strength of the attack of the signal. The tone was
still considered to retain the quality of naturalness, as
established from informal reports, including the response of an
experienced violin player. Our general conclusion from this
preliminary study was that time-varying frequency functions are
necessary to exactly replicate certain second-order features of
tones. This does not preclude the substitution of some other
physical manipulation for these features.
.END
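.BEGIN FILL ADJUST
To make the line-segment representation concrete, the following
Python sketch evaluates a function stored as a short list of
breakpoints and counts the segments needed for a whole tone. It is an
illustration under stated assumptions: the breakpoint values are
invented, the function names are hypothetical, and the text does not
specify how many stored numbers each segment requires, so only
segments are counted.
.END
```python
def piecewise_linear(breakpoints, t):
    """Evaluate a line-segment function at time t.

    breakpoints is a list of (time, value) pairs with strictly
    increasing times; n + 1 breakpoints define n line segments.
    """
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]

def segment_count(num_harmonics, segments_per_function, functions_per_harmonic=2):
    """Total segments for a tone: one amplitude and one frequency function per harmonic."""
    return num_harmonics * functions_per_harmonic * segments_per_function

# Invented three-segment amplitude function for a 400 ms tone.
envelope = [(0.0, 0.0), (0.05, 1.0), (0.30, 0.6), (0.40, 0.0)]
mid_attack = piecewise_linear(envelope, 0.025)

full = segment_count(16, 500)       # about 500 segments per function
reduced = segment_count(16, 3)      # three segments per function
```
.BEGIN FILL ADJUST
For sixteen harmonics this gives 16,000 segments before reduction and
96 after, which is of the same order as the figures quoted above.
.END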
.NEXT PAGE
.SELECT C
PROPOSED RESEARCH
.SELECT 1
.BEGIN FILL ADJUST
We will now turn to our proposed research, and begin by briefly
discussing the necessary extension of the range of timbres covered by
the additive synthesis technique. The eventual development of truly
general techniques is contingent upon this extension of cases
examined. We next describe our plans for a systematic exploration of
data reduction techniques, which include the rigorous testing of
particular methods by perceptual scaling experiments. We then
discuss a practical result from this exploration: the development of
automatic data reduction algorithms. Reduced data structures for the
physical attributes of music instrument tones provide the researcher
with a better tool to investigate the more general aspects of timbre
perception for whole sets of natural sources. This will be amplified
in a later section, devoted to the applications of multidimensional
scaling techniques to timbre perception. We will here discuss the
higher-order algorithms which should result for additive synthesis
from the above research, algorithms which give the user perceptually
meaningful controls and which make optimal use of computer resources
for the simulation of tones.
.END
.GROUP SKIP 2
.SELECT 5
extension of timbral range
.SELECT 1
.BEGIN FILL ADJUST
A necessary step for the eventual development of truly general
simulation techniques is the application of our methods to an
extended set of sources. For this purpose, we are planning to
gather a large collection of tones from string, woodwind and brass
families of musical instruments. Notes at several durations, played
in different manners, will be recorded throughout the ranges of all
instruments in the above families. As our research progresses in
time, we will cover a broader base of signals. We will thereby
investigate the perception of a very diverse set of cases and be
guided to a more general system for simulation. In that the goal of
our endeavors is to develop a technique by which we can realistically
simulate any instrumental sound, having any specific characteristics
in any context that could occur in reality, our data base
necessarily will be extensive. The widening of this data base is an
important part of our future research, and an integral feature of all
phases of investigation that are presented below.
.END
.NEXT PAGE
.SELECT 5
systematic exploration of data reduction techniques
.SELECT 1
.BEGIN FILL ADJUST
A series of rigorous discrimination studies is planned, in which a
wide range of signals and reduction specifications will be
investigated. The basic approach consists of observing the
perceptual effects of systematic modifications and simplifications of
the data which directs synthesis. Listeners will attempt to
discriminate the original digitized tones, tones synthesized from
their complete analyses, and tones which have been significantly
simplified in their parametric data. The results of this testing
will give us the strongest evidence for those aspects of the signal
which are important to perception and those which are insignificant
and need not be present in a simulation, with respect to a
representative population of listeners with varying degrees of
musical training.
The discriminability of signals is a standard perceptual measurement.
The experimental procedure employed for the measurement of
discriminability will involve the judgment of `same' or `different'
for a pair of tones by the listener. The experiment is completely
controlled by computer: pairs of tones are randomly selected from the
stimulus set and played to the listener; his response is tabulated
and the data is analyzed. It should be noted that the computation to
synthesize the tones is done beforehand, and the digital waveform
representing the tone is stored on the bulk-storage disk. On
completion of the computation, the tones are played by the computer
through the digital-to-analog converter. The only analog equipment
used is the standard audio system.
In that we are concerned with the simulation of signals which are
highly realistic to the listener, we will be interested in measuring
the `naturalness' of simulations for listeners who have varying degrees
of musical training. We will test tones both in temporal isolation
and in complex sequences, to determine the relative effects on the
evaluation of naturalness for tones induced by the context in which
they are presented. We realize that this measurement could be
subject to much variability, and we feel that it is important to
carefully examine factors which might be correlated with this
variability, such as the background of the listener and the context
of the signals. It will be important to evaluate the adequacy of
simulations with respect to these factors, if the techniques that we
develop are to obtain generality. Experiments will have listeners
apply an N-point rating scale of relative `naturalness' to a
particular set of tones, which will include the digitized real tones
and discriminable simulations, some of which are the results of
drastically simplified methods, such as fixed-waveform synthesis
where spectral dynamics are absent.
.NEXT PAGE
Our preliminary findings suggest a general success with as few as
three line-segments per control function. Even if this vastly
simplified representation of natural signals turns out to be the
limiting case, rapid progress can be made in understanding the
psychophysical relationships in their perception and identification.
The investigation of these relationships, between the subjective,
perceived qualities of tones and their physical properties, will be
vastly facilitated by the simplified representation of their physical
properties. The importance of relative slopes of attack and onset
times of components, the ranges of variation permissible for
spectral levels, and the necessity to exactly preserve various other
overall characteristics of the analyzed functions for each harmonic
will be closely studied.
.END
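.BEGIN FILL ADJUST
As an illustration only, the `same' or `different' procedure described
above can be sketched in a few lines of Python. The version labels,
function names and the way responses are supplied are assumptions made
for this sketch; the proposal does not specify the control program.
The sketch draws random pairs from the three versions of a tone and
tabulates how often each pairing was judged `different'.
.END
```python
import random
from collections import defaultdict

# Hypothetical labels for the three versions of each tone named in the text.
VERSIONS = ("original", "full_resynthesis", "reduced")


def make_trials(n_trials, rng):
    """Randomly draw ordered pairs of versions; a pair may repeat a version."""
    return [(rng.choice(VERSIONS), rng.choice(VERSIONS)) for _ in range(n_trials)]


def tabulate(trials, responses):
    """Count 'different' judgments for each unordered pair of versions.

    trials    -- list of (version_a, version_b) pairs that were played
    responses -- list of booleans, True when the listener said 'different'
    Returns {pair: (n_different, n_presented)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for (a, b), said_different in zip(trials, responses):
        key = tuple(sorted((a, b)))  # presentation order is ignored
        counts[key][1] += 1
        if said_different:
            counts[key][0] += 1
    return {pair: tuple(c) for pair, c in counts.items()}


def proportion_different(table):
    """Proportion of 'different' judgments for every pair that was presented."""
    return {pair: d / n for pair, (d, n) in table.items() if n > 0}


if __name__ == "__main__":
    rng = random.Random(1974)
    trials = make_trials(20, rng)
    # Stand-in for a real listener: answers 'different' when the labels differ.
    responses = [a != b for a, b in trials]
    print(proportion_different(tabulate(trials, responses)))
```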
.GROUP SKIP 2
.SELECT 5
automatic data reduction algorithms
.SELECT 1
.BEGIN FILL ADJUST
As we examine a broader range of signals, we will be able to design
algorithms for the automatic reduction of the data from analysis
which will replace our initial empirical method of reduction. The
first step will be the automation of the process of
line-segment fitting of complex time-variant amplitude and frequency
functions. The optimal type of fitting procedure, and the range of
permissible variability in the reduction, will have been established
in the research outlined above for a variety of signals. More
sophisticated routines for data reduction will draw from related
research on the perception of timbre which is described below.
.END
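.BEGIN FILL ADJUST
One possible form of automatic line-segment fitting is sketched below
in Python. It is a greedy procedure chosen for this illustration, not
the fitting method of the proposal, which leaves the optimal procedure
to be established by the research outlined above. Starting from the
two endpoints of an analyzed function, it repeatedly adds a breakpoint
at the sample that deviates most from the current fit, until a given
number of segments is reached. All names are hypothetical, and sample
times are assumed to be strictly increasing.
.END
```python
def interpolate(breakpoints, t):
    """Evaluate a piecewise-linear function given sorted (time, value) breakpoints."""
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the breakpoint range")


def fit_line_segments(times, values, max_segments):
    """Greedy line-segment fit of a sampled amplitude or frequency function.

    Returns a list of (time, value) breakpoints using at most max_segments
    segments. Breakpoints are always taken from the original samples.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples of equal-length input")
    chosen = [0, len(times) - 1]  # indices of the current breakpoints
    while len(chosen) - 1 < max_segments:
        fit = [(times[i], values[i]) for i in chosen]
        worst_index, worst_error = None, 0.0
        for i, (t, v) in enumerate(zip(times, values)):
            if i in chosen:
                continue  # existing breakpoints are matched by construction
            error = abs(v - interpolate(fit, t))
            if error > worst_error:
                worst_index, worst_error = i, error
        if worst_index is None:
            break  # every sample already lies on the fitted segments
        chosen.append(worst_index)
        chosen.sort()
    return [(times[i], values[i]) for i in chosen]


def max_error(times, values, breakpoints):
    """Largest absolute deviation between the samples and the fitted segments."""
    return max(abs(v - interpolate(breakpoints, t)) for t, v in zip(times, values))
```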
.GROUP SKIP 2
.SELECT 5
higher-order algorithms
.SELECT 1
.BEGIN FILL ADJUST
The systematic exploration of the relationships between the known
physical properties of tones and their perceptual correlates will
reveal the salient cues for their identification, hence the necessary
features for their simulation. Perceptual scaling experiments
described below are designed to uncover the dimensions and properties
of the subjective space for timbre. With this information we will be
able to begin to approach a general model for the auditory processing
of complex natural signals. We will also benefit by developing a
successful set of strategies for data reduction for the computer
simulation of these signals. At this point we will be able to
investigate the more central aspects of auditory information
processing, the internal representations of complex natural stimuli
and the perception of these stimuli in complex temporal contexts.
This information will lead to very powerful computer simulation
techniques able to produce realistic sounds in highly complex
realistic contexts.
Higher-order simulation techniques will be a direct product of the
research with additive synthesis in conjunction with findings from
the frequency modulation approach described next. A higher-order
simulation algorithm will simultaneously provide the user with a
powerful level of control over salient aspects of tone while reducing
the input specifications required by the simulation procedure and
making them more experientially relevant. As we
come to understand the perceptually important features of tone, the
simulation algorithm will reflect this understanding. Features like
the relative attack slopes and onset times of components, expressed
in simplified graphical-relational form, such as the overall
evolution of the bandwidth of energy distribution of the signal
through time, will come to be directly dealt with by the user. Other
possibly important features of tone, e.g. the modulation of
functions in amplitude or frequency or the existence of bandwidths of
noise, will be controlled via meaningfully simple specifications by
the user.
.END
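.BEGIN FILL ADJUST
To make the idea of a higher-order algorithm concrete, the following
Python sketch expands a few user-level controls into one three-segment
amplitude function per harmonic and then sums the harmonics by
additive synthesis. The particular controls (attack time, onset
stagger between harmonics, spectral rolloff), the envelope shape and
all names are assumptions made for this illustration; frequencies are
held at exact harmonics of the fundamental, although the analyses
discussed above also yield time-variant frequency functions.
.END
```python
import math


def envelopes_from_controls(n_harmonics, duration, attack, onset_stagger, rolloff):
    """Expand a few high-level controls into one envelope per harmonic.

    attack        -- seconds each harmonic takes to rise to its peak
    onset_stagger -- extra onset delay in seconds per harmonic number
    rolloff       -- peak amplitude of harmonic k is 1 / k**rolloff
    Each envelope is a list of four (time, amplitude) breakpoints,
    that is, three line segments: attack, slow decay, release.
    """
    release_start = 0.8 * duration  # assumed fixed release point
    envelopes = []
    for k in range(1, n_harmonics + 1):
        onset = (k - 1) * onset_stagger
        if attack <= 0 or onset + attack >= release_start:
            raise ValueError("attack and onset must fit before the release")
        peak = 1.0 / k ** rolloff
        envelopes.append([(onset, 0.0), (onset + attack, peak),
                          (release_start, 0.5 * peak), (duration, 0.0)])
    return envelopes


def envelope_at(envelope, t):
    """Linearly interpolate an envelope at time t; zero outside its span."""
    if t <= envelope[0][0] or t >= envelope[-1][0]:
        return 0.0
    for (t0, a0), (t1, a1) in zip(envelope, envelope[1:]):
        if t0 <= t <= t1:
            return a0 + (a1 - a0) * (t - t0) / (t1 - t0)
    return 0.0


def synthesize(envelopes, f0, duration, sample_rate):
    """Additive synthesis: sum of harmonics of f0, each scaled by its envelope."""
    samples = []
    for i in range(int(duration * sample_rate)):
        t = i / sample_rate
        value = 0.0
        for k, envelope in enumerate(envelopes, start=1):
            value += envelope_at(envelope, t) * math.sin(2 * math.pi * k * f0 * t)
        samples.append(value)
    return samples
```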
.GROUP SKIP 2